feat(daemon+sdk): cross-client real-time sync completeness #4484
Audit (cross-client sync, 2026-05-24) of the daemon's per-session EventBus fan-out surfaced gaps where one client's actions did not propagate to other SSE-subscribed clients on the same session. This commit closes five of them (all bridge-layer fixes, no agent-side changes) with regression tests covering the new sentinel frame.

## 1. user_message_chunk echo on the interactive prompt path

The agent's `Session#executePrompt` (Session.ts:556+) forwards the prompt straight to the LLM without emitting `user_message_chunk` to the session bus. The cron path (Session.ts:1402) and HistoryReplayer (HistoryReplayer.ts:65) DO emit it; only the interactive path was the outlier. Result: when client A sent a prompt, other clients on the same session saw only the agent's reply, never the input. They had to wait for a session reload to learn what A had asked.

Fix: the `echoPromptToSessionBus` helper publishes one `user_message_chunk` per content block of the incoming `PromptRequest`, stamped with the envelope-level `originatorClientId` so SDK consumers with `suppressOwnUserEcho: true` filter the echo on the originator's UI. Multi-modal blocks (image / audio / resource) pass through verbatim for future-compat with Core's multi-modal echo work.

`_meta.source: 'bridge-echo'` distinguishes bridge-synthesized echoes from agent-emitted content. It is used today only for diagnostic visibility and becomes load-bearing once SDK-side dedup matures (deferred follow-up).

## 2. prompt_cancelled broadcast in cancelSession

`bridge.cancelSession` forwarded the ACP cancel notification to the agent and resolved pending permissions, but did NOT publish any event on the session bus. Other clients learned that A had cancelled only by the absence of further `agent_message_chunk` frames, which is heuristic and late.

Fix: emit a `prompt_cancelled` envelope before the ACP forward so peer clients see the cancel as a first-class event. The envelope-level `originatorClientId` identifies the cancelling client (the one calling `POST /cancel`). Permission-resolution events generated by the subsequent `cancelPendingForSession` continue to omit an originator (those are system-initiated wind-downs, not user-voted).

## 3. replay_complete sentinel in EventBus.subscribe

A consumer attaching via `Last-Event-ID: <n>` had no positive signal when the replay loop drained; it had to heuristically time out the catch-up spinner. The state-resync path already had a synthetic `state_resync_required` frame; the success path lacked parity.

Fix: emit an id-less `replay_complete` synthetic frame at the end of the replay loop (same pattern as `client_evicted` / `state_resync_required`: no slot in the per-session monotonic sequence). It fires both when replay actually delivered frames AND when there was nothing to replay (empty ring), so the consumer always sees the transition from "catching up" to "live".

`data.replayedCount` is the actual count of force-pushed frames, not derived from id arithmetic (which would over-count when the state-resync path leaves a hole before the ring's earliest id). 3 EventBus test cases updated to assert the sentinel frame ordering.

## 4. originatorClientId on session_metadata_updated envelope

`updateSessionMetadata` resolved the trusted client id for validation (`resolveTrustedClientId(entry, context.clientId)`) but did not stamp it on the broadcast envelope, so UIs couldn't attribute the rename to a specific client. Sibling events (`model_switched`, `approval_mode_changed`) all stamp the envelope-level `originatorClientId`; this brings the metadata broadcast to parity.

## 5. originatorClientId on session_closed envelope

`session_closed` carried the closing client in `data.closedBy` only, but every other event the bridge publishes uses the envelope-level `originatorClientId` field. Added the envelope-level stamp (kept `data.closedBy` for back-compat) so SDK consumers can read the attribution from the same place across all event types.

## Out-of-scope (deferred to follow-up)

The cross-client sync audit also surfaced the following items, which require larger design discussion:

- **In-session ACP `setModel` bus emit**: `Session.ts#setModel` calls `config.switchModel` directly without going through the bridge's publish path. Fixing this requires a new ACP sessionUpdate type (`current_model_update`, parallel to the existing `current_mode_update`) or a side-channel callback from agent to bridge.
- **Workspace-wide broadcast of non-persisted approval-mode changes**: current behavior only broadcasts workspace-wide on `persist=true`; the design intent of the persist flag relative to multi-client visibility needs alignment.
- **Serialize `setSessionApprovalMode` through a queue**: analogous to `entry.modelChangeQueue` for `setSessionModel`. Race-condition fix.
- **Reconcile `permission_resolved.originatorClientId` semantics**: it currently carries the VOTER's clientId, while `permission_request` carries the prompt originator, so SDK consumers need to special-case the type. Either change to consistent semantics or add a separate `voterClientId` field.

These are tracked as follow-ups, not in this PR.

## Validation

| Check | Result |
|---|---|
| Bridge tests | 291/291 pass |
| eventBus tests | 105/105 pass (3 updated) |
| TypeScript | clean |
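The replay-then-sentinel ordering from §3 can be illustrated with a small self-contained model. This is a sketch, not the daemon's `EventBus`: the `Frame` shape, the `replay` function, the array-backed ring, and the payload of the synthetic `state_resync_required` frame are assumptions made for illustration. It does show the contract described above: synthetic frames carry no `id`, `replayedCount` counts frames actually pushed, and `replay_complete` is emitted even when there is nothing to replay.

```typescript
// Hypothetical frame shape: synthetic frames carry no `id`, so they take
// no slot in the per-session monotonic sequence.
interface Frame {
  id?: number;
  type: string;
  data: Record<string, unknown>;
}

// Replays buffered frames newer than `lastEventId`, then appends the
// `replay_complete` sentinel. Returns frames in delivery order.
function replay(ring: readonly Frame[], lastEventId: number): Frame[] {
  const out: Frame[] = [];
  const earliestInRing = ring[0]?.id;

  // A hole between the client's cursor and the ring's earliest id means
  // frames were evicted, so the consumer must reload state.
  if (earliestInRing !== undefined && earliestInRing > lastEventId + 1) {
    out.push({ type: 'state_resync_required', data: { lastEventId } });
  }

  let replayedCount = 0; // counted directly, never derived from id arithmetic
  let lastReplayedId: number | undefined;
  for (const frame of ring) {
    if (frame.id !== undefined && frame.id > lastEventId) {
      out.push(frame);
      replayedCount += 1;
      lastReplayedId = frame.id;
    }
  }

  // Always emitted, so the consumer sees "catching up" -> "live".
  out.push({
    type: 'replay_complete',
    data: {
      ...(lastReplayedId !== undefined ? { lastEventId: lastReplayedId } : {}),
      replayedCount,
    },
  });
  return out;
}
```

Counting pushes rather than computing `lastReplayedId - lastEventId` keeps the count honest when the resync branch has already reported a hole.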
📋 Review Summary

This PR addresses five cross-client real-time sync gaps identified in a daemon_mode_b_main audit, implementing mechanical bridge-layer fixes for multi-client SSE propagation. The changes are well-documented, follow existing patterns, and include appropriate test updates. Overall assessment: solid implementation with excellent documentation, though a few edge cases and consistency improvements should be addressed before merging.
wenshao left a comment:
Additional follow-up items (not in this PR's diff):

- SDK normalizer gap: `prompt_cancelled` and `replay_complete` are not handled in `packages/sdk-typescript/src/daemon/ui/normalizer.ts`; both fall through to `default:`, producing `debug` events. SDK consumers cannot programmatically react to these events without a follow-up SDK change.
- `sendPrompt` abort handler: the `onAbort` path (originator SSE disconnect) does not publish `prompt_cancelled` to the bus, while `cancelSession` does. This is the same cross-client sync gap on the sibling cancel path.
— qwen3.7-max via Qwen Code /review
```typescript
// pass through verbatim (the agent's Core multimodal echo is a
// separate follow-up tracked in PR #4353 §D); for now the
// common text path is the immediate fix.
echoPromptToSessionBus(entry, normalized, originatorClientId);
```
[Suggestion] Ghost echo on prompt forward failure
echoPromptToSessionBus fires before entry.connection.prompt(). If the prompt forward rejects (ACP child internal error, transport glitch), the echo events are already in the ring buffer. Peer SSE subscribers see user_message_chunk followed by permanent silence — no error event, no diagnostic in daemon logs.
The ordering is intentional (echo must precede agent response), but the failure path leaves orphaned events. Consider publishing a compensating event on forward failure:
```typescript
const promptPromise = entry.connection
  .prompt(normalized)
  .catch((err) => {
    // Tell peers the prompt failed so the already-published echo is not
    // orphaned, then re-throw so the originator still gets the HTTP error.
    try {
      entry.events.publish({
        type: 'session_update',
        data: {
          sessionId: req.sessionId,
          update: {
            sessionUpdate: 'prompt_error',
            error: String(err),
            _meta: { serverTimestamp: Date.now(), source: 'bridge-prompt-failure' },
          },
        },
        ...(originatorClientId ? { originatorClientId } : {}),
      });
    } catch {
      /* bus closed */
    }
    throw err;
  });
```

— qwen3.7-max via Qwen Code /review
Deferring this one. Compensating for a prompt-forward failure means introducing a new `prompt_error` event type (+ SDK normalizer case + reducer handling + tests), which is broader than this PR's scope. The orphaned-echo window is also narrow: the originator still gets the HTTP error on its `POST /prompt`, so only passive peers see input-then-silence, and for them the absence of any subsequent `tool_call_update` / `agent_message_chunk` is the existing signal. Tracked for batch follow-up in `docs/qwen-daemon/cross-client-sync-followups.md` (C3). Leaving this thread open so it stays visible until the follow-up lands.
wenshao left a comment:
[Suggestion] SDK does not recognize the new event types
DAEMON_KNOWN_EVENT_TYPE_VALUES in packages/sdk-typescript/src/daemon/events.ts does not include prompt_cancelled or replay_complete. The normalizer routes them to debug events and the reducer silently drops them (reduceDaemonSessionEvent returns base unchanged for unknown types). SDK-based consumers (VSCode extension, web UI) cannot programmatically react to these events until the SDK is updated. Consider adding them to the known-event registry in a companion PR, or explicitly documenting in the PR description that SDK-side handling is a required follow-up.
— claude-opus-4-7 via Claude Code /qreview
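A minimal sketch of the normalizer change this suggests, assuming a simplified raw-envelope shape. `normalize`, `RawDaemonEvent`, and the `DaemonUiEvent` union are illustrative stand-ins, not the SDK's actual `normalizer.ts` types; the UI event names `prompt.cancelled` / `session.replay_complete` are the ones the follow-up commit on this PR adopted. Unrecognized types still fall through to `debug`.

```typescript
// Illustrative raw envelope as it arrives over SSE.
interface RawDaemonEvent {
  type: string;
  data?: Record<string, unknown>;
  originatorClientId?: string;
}

// Illustrative typed UI events.
type DaemonUiEvent =
  | { kind: 'prompt.cancelled'; originatorClientId: string | undefined }
  | { kind: 'session.replay_complete'; replayedCount: number }
  | { kind: 'debug'; raw: RawDaemonEvent };

function normalize(event: RawDaemonEvent): DaemonUiEvent {
  switch (event.type) {
    case 'prompt_cancelled':
      return { kind: 'prompt.cancelled', originatorClientId: event.originatorClientId };
    case 'replay_complete': {
      const n = event.data?.['replayedCount'];
      // Defensive: treat a missing or malformed count as zero.
      return { kind: 'session.replay_complete', replayedCount: typeof n === 'number' ? n : 0 };
    }
    default:
      // Unknown types stay visible for diagnostics but carry no semantics.
      return { kind: 'debug', raw: event };
  }
}
```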
Adds two integration tests for the cross-client sync fix:

- "echoes user_message_chunk to ALL session subscribers": two SSE subscribers (A + B) on the same session; client A sends a prompt; asserts BOTH receive the `user_message_chunk` with the originator stamp + `_meta.source: 'bridge-echo'`. This is the core multi-client property: a prompt from one client is visible to every subscriber, not just the originator.
- "echoes one user_message_chunk per content block (multi-modal)": a two-block prompt (text + resource_link) produces two echo frames in order.

Validates the bridge-layer echo end-to-end through the real EventBus + subscribeEvents path, not just a unit of the helper.
…, hardening

Round-2 review of the cross-client sync work. Adds the sibling cancel path, SDK-side recognition of the two new event types so consumers can react instead of debug-dropping, plus hardening + test coverage flagged in review.

## Bridge (acp-bridge)

- Abort-path cancel broadcast: the `sendPrompt` `onAbort` closure (originator SSE disconnect, the most common cancel trigger: tab close, network drop, laptop sleep) previously resolved permissions + forwarded ACP cancel WITHOUT publishing `prompt_cancelled`. Only the explicit `cancelSession` route emitted it. Extracted a shared `broadcastPromptCancelled` helper, called from both paths.
- `echoPromptToSessionBus` hardening: read `req.prompt` directly (no `unknown` cast, so a future SDK type change is a compile error); cap echoed blocks at `MAX_ECHO_CONTENT_BLOCKS` (256) to bound fan-out + ring pressure; corrected the non-text comment (all ContentBlock variants are published verbatim, not "metadata-only").
- Documented `prompt_cancelled`'s "cancel requested, not confirmed" semantic and the intentional unconditional broadcast.

## SDK (sdk-typescript)

The bridge now produces `prompt_cancelled` and `replay_complete`. Without SDK recognition they fall through the normalizer default to `debug` and the reducer drops them, so consumers (VSCode ext, web UI, React CLI) can't react. Added:

- both types to `DAEMON_KNOWN_EVENT_TYPE_VALUES`
- normalizer cases → typed UI events `prompt.cancelled` / `session.replay_complete`
- `DaemonUiPromptCancelledEvent` + `DaemonUiReplayCompleteEvent` types, union + barrel re-exports
- reducer: `prompt.cancelled` runs `propagateCancellationToInFlightTools` (clears peer-cancelled tool spinners, same idempotent path as `assistant.done(cancelled)`); `session.replay_complete` no-ops on blocks
- terminal projection cases for both
- guarded the existing `awaitingResync` `console.warn` with optional chaining so the no-console lint rule passes without referencing the member in the guard condition

## Tests

- bridge.test.ts: `prompt_cancelled` attribution; `session_closed` + `session_metadata_updated` envelope `originatorClientId`
- eventBus.test.ts: resync + replay paths assert the trailing `replay_complete` sentinel (`replayedCount` = actual delivered frames)
- daemonUi.test.ts: normalize `prompt_cancelled` / `replay_complete` (incl. empty-ring zero count); reducer cancellation propagation; replay no-op

## Validation

| Check | Result |
|---|---|
| acp-bridge tests | all pass |
| SDK tests | 637/637 |
| SDK + bridge typecheck | clean |
| webui consumer typecheck | clean |

## Deferred (docs/qwen-daemon/cross-client-sync-followups.md)

Ghost-echo-on-forward-failure; in-session ACP setModel bus emit; approval-mode workspace broadcast + serialization; permission_resolved voter semantics.
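The hardened echo loop (one `user_message_chunk` per block, every block passed through verbatim, capped at 256 blocks, originator stamped only when known) might look roughly like this. It is a sketch under stated assumptions: `echoPrompt`, the `publish` callback, and the three-variant `ContentBlock` union are simplified stand-ins, not the bridge's actual `echoPromptToSessionBus` or the ACP type definitions. `_meta` sits inside `update` to mirror the PR as written.

```typescript
// Simplified, hypothetical content-block union; every variant is echoed verbatim.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'resource_link'; uri: string; name: string }
  | { type: 'image'; data: string; mimeType: string };

interface EchoEnvelope {
  type: 'session_update';
  data: {
    sessionId: string;
    update: {
      sessionUpdate: 'user_message_chunk';
      content: ContentBlock;
      _meta: { serverTimestamp: number; source: 'bridge-echo' };
    };
  };
  originatorClientId?: string;
}

const MAX_ECHO_CONTENT_BLOCKS = 256;

// Returns how many echo frames were published.
function echoPrompt(
  sessionId: string,
  prompt: readonly ContentBlock[],
  originatorClientId: string | undefined,
  publish: (e: EchoEnvelope) => void,
): number {
  if (prompt.length === 0) return 0;
  const serverTimestamp = Date.now(); // one timestamp for the whole prompt
  const blocks = prompt.slice(0, MAX_ECHO_CONTENT_BLOCKS); // bound fan-out + ring pressure
  for (const part of blocks) {
    publish({
      type: 'session_update',
      data: {
        sessionId,
        update: {
          sessionUpdate: 'user_message_chunk',
          content: part,
          _meta: { serverTimestamp, source: 'bridge-echo' },
        },
      },
      // Anonymous prompts omit the field entirely.
      ...(originatorClientId ? { originatorClientId } : {}),
    });
  }
  return blocks.length;
}
```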
Review addressed:

| Item | Action |
|---|---|
| Abort-path cancel doesn't emit `prompt_cancelled` | Extracted shared `broadcastPromptCancelled`; now called from both `cancelSession` AND the `sendPrompt` `onAbort` closure (originator SSE disconnect, the common production cancel path) |
| SDK doesn't recognize `prompt_cancelled` / `replay_complete` | Added both to `DAEMON_KNOWN_EVENT_TYPE_VALUES` + normalizer cases → typed UI events `prompt.cancelled` / `session.replay_complete`; new event interfaces + union + barrel exports; reducer runs cancellation-propagation on `prompt.cancelled`, no-op on `replay_complete`; terminal projection cases. (Scope of this PR widened from acp-bridge to daemon+sdk and was retitled accordingly, since the whole point is making the client-side info-sync observable end-to-end.) |
| `state_resync_required` tests not updated for sentinel | The resync + replay tests now assert the trailing `replay_complete` frame; `replayedCount` asserted = actual delivered frames (not the evicted-hole arithmetic) |
| Missing bridge-level integration tests | Added bridge.test.ts cases: `prompt_cancelled` originator attribution; `session_closed` + `session_metadata_updated` envelope-level `originatorClientId` (the existing echo + multi-modal tests already covered `user_message_chunk`) |
| Unnecessary `(req as { prompt?: unknown })` cast | Read `req.prompt` directly; a future SDK type change now surfaces as a compile error |
| Unbounded content-block iteration | Cap at `MAX_ECHO_CONTENT_BLOCKS` (256) |
| Misleading non-text comment | Corrected: all `ContentBlock` variants are published verbatim today |
Deferred (with rationale)

- Ghost echo on prompt-forward failure → introducing a new `prompt_error` event type is scope creep for this PR. Tracked as a follow-up.
- `prompt_cancelled` ordering / unconditional publish → kept the "broadcast before forward" ordering (peers learn promptly even if the agent is slow) and documented the "cancel requested, not confirmed" semantic inline. Did NOT gate on `activePromptOriginatorClientId`: that field is only set for prompts that carried an originator, so gating would drop the broadcast for anonymous active prompts. A cancel against a genuinely idle session is a harmless idempotent no-op for consumers.

The deferred items (plus several other cross-client gaps surfaced in the audit: in-session `setModel` bus emit, approval-mode workspace broadcast + serialization, `permission_resolved` voter semantics) are catalogued for batch follow-up.
Validation

| Check | Result |
|---|---|
| acp-bridge tests | all pass |
| SDK tests | 637/637 |
| SDK + bridge typecheck | clean |
| webui consumer typecheck | clean (no downstream break) |

Head: c8375e6b3.
Review follow-up: the existing `prompt_cancelled` test only exercised the explicit `cancelSession` route. The `onAbort` path (originator SSE disconnect — tab close / network drop / laptop sleep, the most common production cancel trigger) had no test asserting the broadcast reaches peer subscribers. A future refactor dropping the `broadcastPromptCancelled` call from `onAbort` would have passed silently and re-opened the cross-client gap. New test: hangs the prompt via a non-resolving `promptImpl`, attaches a peer subscriber, aborts the originator's `sendPrompt` signal mid-flight, and asserts the peer receives `prompt_cancelled` with the originator's `clientId`. Releases the hung prompt before shutdown. acp-bridge: 183/183 pass.
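The abort-path test pattern can be modelled without the bridge. In this sketch `Bus`, `sendPrompt`, and this `broadcastPromptCancelled` are hypothetical stand-ins (only `AbortController` / `AbortSignal` are real platform APIs): the prompt hangs on a promise that is released manually, a peer subscribes mid-flight, and aborting the originator's signal must still deliver `prompt_cancelled` carrying the originator's client id.

```typescript
type BusEvent = { type: string; originatorClientId?: string };

// Hypothetical minimal session bus with synchronous fan-out.
class Bus {
  private subs: Array<(e: BusEvent) => void> = [];
  subscribe(fn: (e: BusEvent) => void): void {
    this.subs.push(fn);
  }
  publish(e: BusEvent): void {
    for (const fn of this.subs) fn(e);
  }
}

// Shared helper so both cancel paths broadcast identically.
function broadcastPromptCancelled(bus: Bus, originatorClientId?: string): void {
  bus.publish({ type: 'prompt_cancelled', ...(originatorClientId ? { originatorClientId } : {}) });
}

// Stand-in for sendPrompt: the abort path must broadcast, not only forward the cancel.
function sendPrompt(bus: Bus, hung: Promise<void>, signal: AbortSignal, clientId: string): Promise<void> {
  signal.addEventListener('abort', () => broadcastPromptCancelled(bus, clientId), { once: true });
  return hung;
}

async function demo(): Promise<BusEvent[]> {
  const bus = new Bus();
  const peerSeen: BusEvent[] = [];
  let release: () => void = () => {};
  // Never settles on its own, like the non-resolving promptImpl.
  const hung = new Promise<void>((resolve) => {
    release = () => resolve();
  });
  const ac = new AbortController();
  const pending = sendPrompt(bus, hung, ac.signal, 'client-A');
  bus.subscribe((e) => { peerSeen.push(e); }); // peer attaches mid-flight
  ac.abort(); // originator disconnects; 'abort' listeners run synchronously
  release(); // release the hung prompt before "shutdown"
  await pending;
  return peerSeen;
}
```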
doudouOUC left a comment:
Inline review for cross-client sync completeness — 1 High / 3 Medium / 2 Low. The five fixes themselves are correct and well-documented; comments below flag edge cases worth resolving (in this PR or as tracked follow-ups). Not blocking — the deferred items from the original audit are scoped well and chiga0's responses to wenshao's first round closed the substantive gaps.
— claude-opus-4-7 via Claude Code /qreview
```typescript
queue.forcePush({
  v: EVENT_SCHEMA_VERSION,
  type: 'replay_complete',
  data: {
    ...(lastReplayedId !== undefined
      ? { lastEventId: lastReplayedId }
      : {}),
    replayedCount,
  },
});
```
[High] `replay_complete` confidently reports "caught up" in a scenario where the consumer is actually stale.

When `opts.lastEventId !== undefined` AND the ring is empty (e.g., the daemon was restarted and the session bus was re-created with `nextId=1`, but the consumer is reconnecting with `lastEventId=N` from the previous daemon lifetime):

- `earliestInRing` is `undefined` → the guard at line 391-392 short-circuits → no `state_resync_required` is emitted.
- The for loop is a no-op (empty ring) → `replayedCount = 0`.
- `replay_complete { replayedCount: 0 }` is force-pushed → the SDK consumer drops the catch-up indicator.

The consumer's reducer state is fully stale, but `replay_complete` says "all good." Pre-PR, the consumer at least had the heuristic spinner timeout to hint at suspicion; with the new sentinel, the lie is deterministic and silent.

This is technically a pre-existing `state_resync_required` gap (the empty-ring case was already untested for resync detection), but the new sentinel makes the misreport more authoritative. Worth either:

- Adding `noRingHistory: true` (or similar) to `replay_complete.data` when `earliestInRing === undefined` and `opts.lastEventId !== undefined`, so consumers can opt to force a `loadSession`, OR
- Emitting `state_resync_required` in this branch too (reason: `'ring_empty_on_resume'`) before the sentinel.

Option 2 is more invasive but closes the gap end-to-end; option 1 is one extra field and lets the SDK decide. This can be a tracked follow-up, but I'd surface it before merge so the gap doesn't get buried.
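Both options can be expressed as one decision function run after the replay loop. `planResumeSignals`, `Strategy`, and `SyntheticFrame` are hypothetical glue; `noRingHistory` and `'ring_empty_on_resume'` are the reviewer's proposed names, not shipped fields. The key change is that an empty ring plus a client-supplied cursor is treated as suspicious rather than as "caught up", while a first attach (no cursor) is left alone.

```typescript
type Strategy = 'flag' | 'resync';

interface SyntheticFrame {
  type: 'state_resync_required' | 'replay_complete';
  data: Record<string, unknown>;
}

function planResumeSignals(
  earliestInRing: number | undefined,
  clientLastEventId: number | undefined,
  replayedCount: number,
  strategy: Strategy,
): SyntheticFrame[] {
  // The client resumes with a cursor, but the bus has no history at all
  // (e.g. the daemon restarted and the bus began again at nextId=1).
  const emptyOnResume = earliestInRing === undefined && clientLastEventId !== undefined;
  const frames: SyntheticFrame[] = [];
  if (emptyOnResume && strategy === 'resync') {
    // Option 2: force a state reload before the sentinel.
    frames.push({ type: 'state_resync_required', data: { reason: 'ring_empty_on_resume' } });
  }
  frames.push({
    type: 'replay_complete',
    data: {
      replayedCount,
      // Option 1: annotate the sentinel and let the SDK decide.
      ...(emptyOnResume && strategy === 'flag' ? { noRingHistory: true } : {}),
    },
  });
  return frames;
}
```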
```typescript
// silent-absence-of-chunks state this work set out to fix.
// `originatorClientId` here is the prompt's own originator (the
// client whose connection dropped).
broadcastPromptCancelled(entry, sessionId, originatorClientId);
```
[Medium] Double `prompt_cancelled` broadcast when a client both calls `cancelSession` AND drops its SSE connection.

Scenario: a buggy or impatient client POSTs `/cancel` and then closes the SSE tab/socket before the cancel response lands. Both code paths fire:

- `cancelSession` (line 2145) → 1× `prompt_cancelled`
- `onAbort` here (line 2079) → 1× more `prompt_cancelled`

The reducer's `propagateCancellationToInFlightTools` is idempotent (verified: re-runs over the same in-flight tool set are no-ops), so transcript state is safe. But:

- Peer SSE subscribers see two `prompt_cancelled` frames on the wire → potential double-render of cancel banners in naive UIs.
- The deferred follow-up about the `permission_resolved` originator-vs-voter inconsistency probably wants symmetric attention here too.

Options:

- Gate the abort-path broadcast on `entry.activePromptOriginatorClientId !== undefined` AND a "no prior cancel" sentinel (e.g., set a flag on `entry` that `cancelSession` and `onAbort` both check + flip).
- Document this as expected and let the SDK dedup `prompt_cancelled` frames within a short window.
- Move the broadcast into `entry.connection.cancel(...)`'s success callback so there's exactly one path emitting it (this changes the "requested vs confirmed" semantic, though).

If the call is to keep both paths broadcasting, an inline comment here noting the duplicate-fire scenario would help oncall.
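The flag half of the first option could look like the sketch below. It deliberately omits the `activePromptOriginatorClientId` gate, since the author's stated rationale is that gating on it would drop the broadcast for anonymous prompts. `SessionEntry`, `beginPrompt`, and `broadcastPromptCancelledOnce` are hypothetical names, and `published` stands in for the session bus.

```typescript
interface SessionEntry {
  promptSeq: number; // incremented each time a prompt starts
  cancelBroadcastForPrompt?: number; // promptSeq that already broadcast a cancel
  published: string[]; // stand-in for the session bus
}

function beginPrompt(entry: SessionEntry): void {
  entry.promptSeq += 1;
}

// Called from both cancelSession and the sendPrompt onAbort closure.
// Whichever runs first publishes; the other becomes a no-op.
function broadcastPromptCancelledOnce(entry: SessionEntry): boolean {
  if (entry.cancelBroadcastForPrompt === entry.promptSeq) return false;
  entry.cancelBroadcastForPrompt = entry.promptSeq;
  entry.published.push('prompt_cancelled');
  return true;
}
```

Keying the guard on a per-prompt sequence number rather than a boolean avoids a separate reset step when the next prompt begins.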
```typescript
update: {
  sessionUpdate: 'user_message_chunk',
  content: part,
  _meta: { serverTimestamp, source: 'bridge-echo' },
```
[Medium] `_meta` is placed inside the ACP `update` object alongside spec-defined fields (`sessionUpdate`, `content`), not on the envelope.

```typescript
update: {
  sessionUpdate: 'user_message_chunk',
  content: part,
  _meta: { serverTimestamp, source: 'bridge-echo' },
},
```

The qwen SDK tolerates this (the normalizer ignores unknown fields on `update`), but any strict third-party ACP consumer that validates with `additionalProperties: false` against the ACP `SessionNotification.update` schema would reject the frame. ACP's published schema is the contract between the daemon and arbitrary ACP clients, not just our own SDK.

The envelope (`BridgeEvent`) is the bridge's own type: putting `_meta` there is safe; putting it inside `update` is risky.

Suggestion: hoist it to envelope level. Then `originatorClientId`, `_meta.source`, and `_meta.serverTimestamp` are all colocated as bridge-extension fields, and the `update` payload stays spec-clean:

```typescript
entry.events.publish({
  type: 'session_update',
  data: {
    sessionId: req.sessionId,
    update: { sessionUpdate: 'user_message_chunk', content: part },
  },
  _meta: { serverTimestamp, source: 'bridge-echo' },
  ...(originatorClientId ? { originatorClientId } : {}),
});
```

Non-blocking if there are no current third-party ACP consumers, but worth a decision before merge, since moving it later is a wire-format break.
```typescript
type: 'replay_complete',
data: {
  ...(lastReplayedId !== undefined
    ? { lastEventId: lastReplayedId }
```
[Medium] Wire field name lastEventId overlaps with the SSE protocol's Last-Event-ID (which here lives on the envelope id).
The SDK normalizer already renames this to lastReplayedEventId (normalizer.ts:163) — but on the wire / in raw daemon traces, an oncall debugging SSE will see data.lastEventId: 42 and instinctively cross-reference it with the SSE protocol's Last-Event-ID header, which is a different thing entirely (envelope id of the LAST DELIVERED frame, not the last id within the replay batch).
Suggest renaming the wire field to lastReplayedId for parity with the SDK side. Cost: one rename here + the normalizer line that reads numberField(event.data, 'lastEventId'). Cheap to do before merge, expensive after deployment.
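If the rename happens after some clients are already deployed, a tolerant reader on the SDK side could accept both names for a transition window. A minimal sketch: `readLastReplayedId` is a hypothetical helper, and `lastReplayedId` is the reviewer's proposed wire name, not a shipped field.

```typescript
// Prefer the proposed new field; fall back to the legacy `lastEventId`
// so older daemons keep working during the transition.
function readLastReplayedId(data: Record<string, unknown>): number | undefined {
  for (const key of ['lastReplayedId', 'lastEventId'] as const) {
    const v = data[key];
    if (typeof v === 'number' && Number.isFinite(v)) return v;
  }
  return undefined;
}
```

Checking the new name first means that if a daemon ever sends both fields, the new one wins.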
```typescript
      _meta: { serverTimestamp, source: 'bridge-echo' },
    },
  },
  ...(originatorClientId ? { originatorClientId } : {}),
```
[Low] When `originatorClientId` is `undefined` (anonymous prompt: no `X-Qwen-Client-Id` header, e.g., curl smoke-tests or pre-clientId-registration scripts), the envelope omits the field entirely.

The SDK's `suppressOwnUserEcho` dedup at normalizer.ts:336-340 matches by `event.originatorClientId === opts.clientId`. With no envelope-level field, the dedup never fires, so even the originator's own UI will double-render its prompt (once via local `setState`, once via this echo).

In practice production clients always carry a clientId, so this only bites anonymous callers. But it's worth a short comment near the conditional spread:

```typescript
// Anonymous prompts (no originatorClientId) cannot be deduped by
// `suppressOwnUserEcho`: the originator will see their own input echoed.
// Acceptable because the anonymous flow is mostly smoke-tests / curl.
...(originatorClientId ? { originatorClientId } : {}),
```

Not a code change, just doc-rot prevention.
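The dedup rule reduces to a small predicate, which makes the anonymous case easy to see. `shouldSuppressEcho` and `EchoFilterOptions` are hypothetical simplifications of the logic at normalizer.ts:336-340, not its actual code; the explicit `clientId === undefined` early return is an added assumption so that two missing ids are never treated as a match.

```typescript
interface EchoFilterOptions {
  suppressOwnUserEcho: boolean;
  clientId?: string;
}

// True when an echoed user_message_chunk should be hidden because this UI
// already rendered the prompt locally.
function shouldSuppressEcho(
  envelopeOriginatorClientId: string | undefined,
  opts: EchoFilterOptions,
): boolean {
  if (!opts.suppressOwnUserEcho || opts.clientId === undefined) return false;
  // An anonymous envelope carries no originator, so it never matches and the
  // originator's own UI renders its prompt twice.
  return envelopeOriginatorClientId === opts.clientId;
}
```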
```typescript
// makes it optional surfaces as a TypeScript error rather than being
// silently swallowed by an `unknown` cast.
const prompt = req.prompt;
if (prompt.length === 0) return;
```
[Low] `prompt.length` access has no nullish guard.

The JSDoc above states `PromptRequest.prompt` is non-optional per the ACP type contract, and TypeScript would catch a `prompt?: ContentBlock[]` field downgrade at compile time. So this is safe assuming the type contract is enforced at the HTTP route boundary (i.e., the route validates the body before constructing `PromptRequest`).

If any route path constructs a `PromptRequest` from a partially-validated body (e.g., for retry / shadow / test injection), `req.prompt === undefined` would crash here with `Cannot read properties of undefined (reading 'length')`, which would surface as a confusing 500 from the prompt route rather than a clear 400.

Minimal-cost defense:

```typescript
const prompt = req.prompt;
if (!prompt || prompt.length === 0) return;
```

Really a Low (the existing JSDoc is doing the right thing), but the one-line change has zero cost.
wenshao left a comment:
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
LGTM ✅
wenshao left a comment:
No review findings. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Thanks @doudouOUC for the thorough pass, and for the approval. These landed after merge, so I'm triaging them as follow-ups (recorded in [High] D1 —
- Concurrent in-session `/model` drift (Critical): add §2.2 post-roundtrip reconciliation. On roundtrip settle, the bridge re-reads the agent's actual model and emits a corrective `model_switched` if divergent (in-session `/model` bypasses `modelChangeQueue`, so drop-when-suppressed could otherwise leave the bus on A while the session runs B).
- IDE-companion lockstep (Critical): add a one-release dual-emit transition (publish both the generic `session_update` and the promoted `approval_mode_changed`) and enumerate the upstream dispatch sites (daemonIdeConnection.ts / DaemonChannelBridge.ts) that drop unknown types and also need handlers.
- Specify the `model_switched` payload mapping (`currentModelId` → `data.modelId`, envelope `sessionId` → `data.sessionId`); without it the SDK validator drops the promoted event and A1 is non-functional.
- Require demux observability (structured log: promoted/dropped/suppressed/generic) at every decision point.
- Correct the reviewer's "replay_complete doesn't exist": it shipped in merged #4484 (eventBus.ts:444); A5 phase 2 introduces only `session_snapshot`.
- First-attach no longer synthesizes `replay_complete{0}` (that would widen the event's contract); `session_snapshot` is self-delimiting on first attach.
- Tighten capture-at-emission to a synchronous read+publish block.
- Specify the helper-generalization migration model; resolve Q3 (keep the extMethod bypass); add the A4 distinguishing test (done in #4539) to §8.
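The post-roundtrip reconciliation item can be sketched as a pure function: once the roundtrip settles, compare the model the bus last announced with the model the agent actually reports, and emit a corrective `model_switched` only on divergence. `reconcileModel` and its parameters are hypothetical; the `data.modelId` / `data.sessionId` fields follow the payload mapping specified in the list above.

```typescript
interface ModelSwitchedEvent {
  type: 'model_switched';
  data: { sessionId: string; modelId: string };
}

function reconcileModel(
  sessionId: string,
  busModelId: string | undefined, // what peers were last told
  actualModelId: string, // what the agent reports after the roundtrip
): ModelSwitchedEvent | null {
  // The bus already reflects reality: publish nothing.
  if (busModelId === actualModelId) return null;
  // Corrective event so peers stop showing a stale model.
  return { type: 'model_switched', data: { sessionId, modelId: actualModelId } };
}
```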
Squashed feature work from daemon_mode_b_main branch, rebased onto latest main to establish proper merge-base and clean PR diff. Original commits:

- perf(core): F2 cleanup PR A — R9/W11/W12/R10 (post-merge follow-ups) (#4411)
- refactor(acp-bridge): F1 test split — lift bridge.test.ts (6861 LOC) to acp-bridge (#4445)
- fix(core): F2 cleanup PR B — self-heal observability (W133-a + W134) (#4460)
- feat(sdk/daemon-ui): unified completeness follow-up to #4328 (#4353)
- docs(serve): v0.16-alpha known limits + SDK QWEN_SERVER_TOKEN env fallback (PR 27) (#4473)
- docs(deploy): local launch templates for v0.16-alpha (PR 30a) (#4483)
- feat(daemon+sdk): cross-client real-time sync completeness (#4484)
- feat(serve): add POST /session/:id/recap (#4504)
- feat(daemon): add voterClientId to permission_resolved (A4) (#4539)
- feat(serve): --allow-origin <pattern> CORS allowlist (T2.4 #4514) (#4527)
- feat(daemon): in-session model switch reaches the bus (A1) (#4546)
- feat(serve): prompt absolute deadline + SSE writer idle timeout (#4514 T2.9) (#4530)
- Feat/daemon react cli (#4380)
* feat(acp-bridge): cross-client real-time sync completeness (5 fixes) Audit (cross-client sync, 2026-05-24) of the daemon's per-session EventBus fan-out surfaced gaps where one client's actions did not propagate to other SSE-subscribed clients on the same session. This commit closes five of them — all bridge-layer fixes, no agent-side changes — with regression tests covering the new sentinel frame. ## 1. user_message_chunk echo on the interactive prompt path The agent's `Session#executePrompt` (Session.ts:556+) forwards the prompt straight to the LLM without emitting `user_message_chunk` to the session bus. The cron path (Session.ts:1402) and HistoryReplayer (HistoryReplayer.ts:65) DO emit it; only the interactive path was the outlier. Result: when client A sent a prompt, other clients on the same session saw only the agent's reply, never the input — they had to wait for a session reload to learn what A had asked. Fix: `echoPromptToSessionBus` helper publishes one `user_message_chunk` per content block of the incoming `PromptRequest`, stamped with the envelope-level `originatorClientId` so SDK consumers with `suppressOwnUserEcho: true` filter the echo on the originator's UI. Multi-modal blocks (image / audio / resource) pass through verbatim for future-compat with Core's multi-modal echo work. `_meta.source: 'bridge-echo'` distinguishes bridge-synthesized echoes from agent-emitted content. Used today only for diagnostic visibility; becomes load-bearing once SDK-side dedup matures (deferred follow-up). ## 2. prompt_cancelled broadcast in cancelSession `bridge.cancelSession` forwarded the ACP cancel notification to the agent and resolved pending permissions, but did NOT publish any event on the session bus. Other clients learned that A had cancelled only by absence of further `agent_message_chunk` frames — heuristic and late. Fix: emit a `prompt_cancelled` envelope before the ACP forward so peer clients see the cancel as a first-class event. 
Envelope-level `originatorClientId` identifies the cancelling client (the one calling `POST /cancel`). Permission-resolution events generated by the subsequent `cancelPendingForSession` continue to omit an originator (those are system-initiated wind-downs, not user-voted). ## 3. replay_complete sentinel in EventBus.subscribe A consumer attaching via `Last-Event-ID: <n>` had no positive signal when the replay loop drained — they had to heuristically time out the catch-up spinner. The state-resync path already had a synthetic `state_resync_required` frame; the success path lacked parity. Fix: emit an id-less `replay_complete` synthetic frame at the end of the replay loop (same pattern as `client_evicted` / `state_resync_required` — no slot in the per-session monotonic sequence). Fires both when replay actually delivered frames AND when there was nothing to replay (empty ring), so the consumer always sees the transition from "catching up" to "live". `data.replayedCount` is the actual count of force-pushed frames (not derived from id arithmetic, which would over-count when the state-resync path leaves a hole before the ring's earliest id). 3 EventBus test cases updated to assert the sentinel frame ordering. ## 4. originatorClientId on session_metadata_updated envelope `updateSessionMetadata` resolved the trusted client id for validation (`resolveTrustedClientId(entry, context.clientId)`) but did not stamp it on the broadcast envelope. UIs couldn't attribute the rename to a specific client. Sibling events (`model_switched`, `approval_mode_changed`) all stamp envelope-level `originatorClientId`; this brings the metadata broadcast to parity. ## 5. originatorClientId on session_closed envelope `session_closed` carried the closing client in `data.closedBy` only, but every other event the bridge publishes uses the envelope-level `originatorClientId` field. 
Added the envelope-level stamp (kept `data.closedBy` for back-compat) so SDK consumers can read the attribution from the same place across all event types. ## Out-of-scope (deferred to follow-up) The cross-client sync audit also surfaced 3 items that require larger design discussion: - **In-session ACP `setModel` bus emit** — `Session.ts#setModel` calls `config.switchModel` directly without going through the bridge's publish path. Fixing this requires a new ACP sessionUpdate type (`current_model_update`, parallel to existing `current_mode_update`) or a side-channel callback from agent to bridge. - **Workspace-wide broadcast of non-persisted approval-mode changes** — current behavior only broadcasts workspace-wide on `persist=true`; the design intent of the persist flag relative to multi-client visibility needs alignment. - **Serialize `setSessionApprovalMode` through a queue** — analogous to `entry.modelChangeQueue` for `setSessionModel`. Race-condition fix. - **Reconcile `permission_resolved.originatorClientId` semantics** — it currently carries the VOTER's clientId; `permission_request` carries the prompt originator. SDK consumers need to special-case the type. Either change to consistent semantics or add a separate `voterClientId` field. These are tracked as follow-ups, not in this PR. ## Validation | | | |---|---| | Bridge tests | 291/291 pass | | eventBus tests | 105/105 pass (3 updated) | | TypeScript | clean | * test(acp-bridge): multi-client user_message_chunk echo coverage Adds two integration tests for the cross-client sync fix: - "echoes user_message_chunk to ALL session subscribers": two SSE subscribers (A + B) on the same session; client A sends a prompt; asserts BOTH receive the user_message_chunk with the originator stamp + `_meta.source: 'bridge-echo'`. This is the core multi-client property — a prompt from one client is visible to every subscriber, not just the originator. 
- "echoes one user_message_chunk per content block (multi-modal)": a two-block prompt (text + resource_link) produces two echo frames in order.

Validates the bridge-layer echo end-to-end through the real EventBus + subscribeEvents path, not just a unit of the helper.

* feat(daemon+sdk): address review — abort-path cancel, SDK recognition, hardening

Round-2 review of the cross-client sync work. Adds the sibling cancel path, SDK-side recognition of the two new event types so consumers can react instead of debug-dropping, plus hardening + test coverage flagged in review.

## Bridge (acp-bridge)

- Abort-path cancel broadcast: the `sendPrompt` `onAbort` closure (originator SSE disconnect, the most common cancel trigger: tab close, network drop, laptop sleep) previously resolved permissions + forwarded ACP cancel WITHOUT publishing `prompt_cancelled`. Only the explicit `cancelSession` route emitted it. Extracted a shared `broadcastPromptCancelled` helper, called from both paths.
- echoPromptToSessionBus hardening: read `req.prompt` directly (no `unknown` cast, so a future SDK type change is a compile error); cap echoed blocks at MAX_ECHO_CONTENT_BLOCKS (256) to bound fan-out + ring pressure; corrected the non-text comment (all ContentBlock variants are published verbatim, not "metadata-only").
- Documented prompt_cancelled's "cancel requested, not confirmed" semantic and the intentional unconditional broadcast.

## SDK (sdk-typescript)

The bridge now produces `prompt_cancelled` and `replay_complete`. Without SDK recognition they fall through the normalizer default to `debug` and the reducer drops them, so consumers (VSCode ext, web UI, React CLI) can't react.
Added:

- both types to DAEMON_KNOWN_EVENT_TYPE_VALUES
- normalizer cases → typed UI events `prompt.cancelled` / `session.replay_complete`
- DaemonUiPromptCancelledEvent + DaemonUiReplayCompleteEvent types, union + barrel re-exports
- reducer: prompt.cancelled runs propagateCancellationToInFlightTools (clears peer-cancelled tool spinners, same idempotent path as assistant.done(cancelled)); session.replay_complete no-ops on blocks
- terminal projection cases for both
- guarded the existing awaitingResync console.warn with optional chaining so the no-console lint rule passes without referencing the member in the guard condition

## Tests

- bridge.test.ts: prompt_cancelled attribution; session_closed + session_metadata_updated envelope originatorClientId
- eventBus.test.ts: resync + replay paths assert the trailing replay_complete sentinel (replayedCount = actual delivered frames)
- daemonUi.test.ts: normalize prompt_cancelled / replay_complete (incl. empty-ring zero count); reducer cancellation propagation; replay no-op

## Validation

| Check | Result |
|---|---|
| acp-bridge tests | all pass |
| SDK tests | 637/637 |
| SDK + bridge typecheck | clean |
| webui consumer typecheck | clean |

## Deferred (docs/qwen-daemon/cross-client-sync-followups.md)

Ghost-echo-on-forward-failure; in-session ACP setModel bus emit; approval-mode workspace broadcast + serialization; permission_resolved voter semantics.

* test(acp-bridge): cover prompt_cancelled on the sendPrompt abort path

Review follow-up: the existing `prompt_cancelled` test only exercised the explicit `cancelSession` route. The `onAbort` path (originator SSE disconnect: tab close / network drop / laptop sleep, the most common production cancel trigger) had no test asserting the broadcast reaches peer subscribers. A future refactor dropping the `broadcastPromptCancelled` call from `onAbort` would have passed silently and re-opened the cross-client gap.
New test: hangs the prompt via a non-resolving `promptImpl`, attaches a peer subscriber, aborts the originator's `sendPrompt` signal mid-flight, and asserts the peer receives `prompt_cancelled` with the originator's `clientId`. Releases the hung prompt before shutdown.

acp-bridge: 183/183 pass.

---------

Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
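The round-2 SDK changes above (normalizer cases, typed UI events, reducer handling) can be sketched with simplified types. `DaemonEnvelope`, the `kind` discriminator, `UiState`, and the `catchingUp` flag are assumptions, not the SDK's real `DaemonUi*` types; `propagateCancellationToInFlightTools` is reimplemented here only to illustrate the idempotent behavior the commit describes.

```typescript
// Simplified stand-ins for the SDK's envelope and UI-event types (assumed shapes).
interface DaemonEnvelope {
  type: string;
  originatorClientId?: string;
  data: Record<string, unknown>;
}

type DaemonUiEvent =
  | { kind: "prompt.cancelled"; originatorClientId?: string }
  | { kind: "session.replay_complete"; replayedCount: number }
  | { kind: "debug"; raw: DaemonEnvelope };

// Normalizer: without the first two cases both types would fall through to "debug".
function normalize(env: DaemonEnvelope): DaemonUiEvent {
  switch (env.type) {
    case "prompt_cancelled":
      return { kind: "prompt.cancelled", originatorClientId: env.originatorClientId };
    case "replay_complete": {
      const count = env.data.replayedCount;
      return { kind: "session.replay_complete", replayedCount: typeof count === "number" ? count : 0 };
    }
    default:
      return { kind: "debug", raw: env };
  }
}

interface ToolBlock {
  id: string;
  status: "running" | "completed" | "cancelled";
}

interface UiState {
  blocks: ToolBlock[];
  catchingUp: boolean; // hypothetical flag driving a "catching up" spinner
}

// Idempotent: only still-running tools change; finished ones keep their status.
function propagateCancellationToInFlightTools(blocks: ToolBlock[]): ToolBlock[] {
  return blocks.map((b): ToolBlock => (b.status === "running" ? { ...b, status: "cancelled" } : b));
}

function reduce(state: UiState, ev: DaemonUiEvent): UiState {
  switch (ev.kind) {
    case "prompt.cancelled":
      // A peer cancelled: clear this client's in-flight tool spinners too.
      return { ...state, blocks: propagateCancellationToInFlightTools(state.blocks) };
    case "session.replay_complete":
      // Blocks are left untouched; only the catch-up flag flips.
      return { ...state, catchingUp: false };
    default:
      return state;
  }
}

// Usage: a resuming client finishes replay, then sees a peer's cancel.
let state: UiState = {
  blocks: [
    { id: "t1", status: "running" },
    { id: "t2", status: "completed" },
  ],
  catchingUp: true,
};
state = reduce(state, normalize({ type: "replay_complete", data: { replayedCount: 0 } }));
state = reduce(state, normalize({ type: "prompt_cancelled", originatorClientId: "client-B", data: {} }));
```

Applying `prompt.cancelled` a second time leaves the blocks unchanged, which is what makes it safe to share the path with `assistant.done(cancelled)`.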
Summary
A cross-client real-time sync audit of `daemon_mode_b_main` (2026-05-24) surfaced eight gaps where one client's actions did not propagate to other SSE-subscribed clients on the same session. This PR closes the five that are bridge-layer mechanical fixes: the `user_message_chunk` echo on the prompt path, the `prompt_cancelled` broadcast on cancel, a `replay_complete` sentinel for Last-Event-ID resume, and envelope-level `originatorClientId` on `session_metadata_updated` + `session_closed`. The remaining three (in-session ACP `setModel` bus emit, workspace-wide non-persisted approval-mode broadcast, `permission_resolved` originator-vs-voter semantics) need larger design alignment and are tracked as separate follow-ups.
What this PR delivers
1. `user_message_chunk` echo on the interactive prompt path
The agent's `Session#executePrompt` forwards the prompt straight to the LLM without emitting `user_message_chunk` to the session bus. The cron path (`Session.ts:1402`) and `HistoryReplayer` (`HistoryReplayer.ts:65`) DO emit it; only the interactive path was the outlier. Result: when client A sent a prompt, other clients on the same session saw only the agent's reply — never the input — until a session reload.
Fix: new `echoPromptToSessionBus` helper in `bridge.ts` publishes one `user_message_chunk` per content block of the incoming `PromptRequest`, stamped with envelope-level `originatorClientId` so SDK consumers with `suppressOwnUserEcho: true` filter the echo on the originator's own UI. Multi-modal blocks pass through verbatim for future-compat with Core's multi-modal echo work (PR #4353 §D).
`_meta.source: 'bridge-echo'` distinguishes bridge-synthesized echoes from agent-emitted content.
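A minimal sketch of the echo path, assuming simplified stand-in types: `BusEnvelope`, `SessionBus`, the `session_update` envelope type, and the client-side `shouldRender` filter are hypothetical, and the payload is only loosely modeled on ACP's `session/update` notification. It shows one `user_message_chunk` per prompt block, the envelope-level `originatorClientId` stamp, the `_meta.source: 'bridge-echo'` marker, the `MAX_ECHO_CONTENT_BLOCKS` cap added during review, and how a `suppressOwnUserEcho`-style consumer hides only its own echo.

```typescript
// Simplified stand-ins for the ACP / bridge types (assumed shapes, not the real ones).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }
  | { type: "audio"; data: string; mimeType: string }
  | { type: "resource_link"; uri: string; name: string };

interface PromptRequest {
  sessionId: string;
  prompt: ContentBlock[];
}

interface BusEnvelope {
  type: string; // hypothetical envelope discriminator
  originatorClientId?: string;
  data: {
    sessionUpdate: string;
    content?: ContentBlock;
    _meta?: { source?: string };
  };
}

interface SessionBus {
  publish(envelope: BusEnvelope): void;
}

// Cap named in the PR; bounds fan-out and ring-buffer pressure.
const MAX_ECHO_CONTENT_BLOCKS = 256;

// Publish one user_message_chunk per prompt block, verbatim and in order.
function echoPromptToSessionBus(
  bus: SessionBus,
  req: PromptRequest,
  originatorClientId: string | undefined,
): number {
  const blocks = req.prompt.slice(0, MAX_ECHO_CONTENT_BLOCKS);
  for (const content of blocks) {
    bus.publish({
      type: "session_update",
      originatorClientId,
      data: {
        sessionUpdate: "user_message_chunk",
        content, // text, image, audio, resource: all passed through as-is
        _meta: { source: "bridge-echo" },
      },
    });
  }
  return blocks.length;
}

// Client side: with suppressOwnUserEcho on, drop only echoes of this client's own prompt.
function shouldRender(env: BusEnvelope, myClientId: string, suppressOwnUserEcho: boolean): boolean {
  const isOwnEcho =
    env.data.sessionUpdate === "user_message_chunk" && env.originatorClientId === myClientId;
  return !(suppressOwnUserEcho && isOwnEcho);
}

// Usage: client A sends a two-block prompt; both blocks are echoed in order.
const published: BusEnvelope[] = [];
const bus: SessionBus = { publish: (e) => { published.push(e); } };
echoPromptToSessionBus(
  bus,
  {
    sessionId: "s1",
    prompt: [
      { type: "text", text: "explain this file" },
      { type: "resource_link", uri: "file:///src/a.ts", name: "a.ts" },
    ],
  },
  "client-A",
);
```

Filtering on the receiving side rather than skipping the originator at publish time keeps the bus identical for every subscriber; each client decides locally whether an echo is its own.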
2. `prompt_cancelled` broadcast in cancelSession
`bridge.cancelSession` forwarded the ACP cancel notification to the agent and resolved pending permissions, but did NOT publish any event on the session bus. Other clients learned that A had cancelled only by absence of further `agent_message_chunk` frames — heuristic and late.
Fix: emit a `prompt_cancelled` envelope before the ACP forward so peer clients see the cancel as a first-class event. Envelope-level `originatorClientId` identifies the cancelling client.
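The two cancel triggers (explicit `POST /cancel` via `cancelSession`, and the `sendPrompt` abort path added in the review round) can share one broadcast helper. A sketch under stated assumptions: `SketchBridge` and the `forwardAcpCancel` callback are hypothetical stand-ins, the real `broadcastPromptCancelled` signature may differ, and pending-permission resolution is omitted. The recorded log shows the broadcast landing before the ACP forward on both paths.

```typescript
interface BusEnvelope {
  type: string;
  originatorClientId?: string;
  data: Record<string, unknown>;
}

interface SessionBus {
  publish(envelope: BusEnvelope): void;
}

// Shared helper for both cancel paths. Semantics: "cancel requested", not
// "cancel confirmed"; it is broadcast unconditionally.
function broadcastPromptCancelled(bus: SessionBus, sessionId: string, clientId: string | undefined): void {
  bus.publish({ type: "prompt_cancelled", originatorClientId: clientId, data: { sessionId } });
}

// Hypothetical, heavily simplified bridge; forwardAcpCancel stands in for the ACP cancel notification.
class SketchBridge {
  private readonly bus: SessionBus;
  private readonly forwardAcpCancel: (sessionId: string) => void;

  constructor(bus: SessionBus, forwardAcpCancel: (sessionId: string) => void) {
    this.bus = bus;
    this.forwardAcpCancel = forwardAcpCancel;
  }

  // Path 1: explicit POST /cancel from any client.
  cancelSession(sessionId: string, clientId: string): void {
    broadcastPromptCancelled(this.bus, sessionId, clientId); // peers hear about it first
    this.forwardAcpCancel(sessionId);
  }

  // Path 2: the originator's request is aborted (SSE disconnect, tab close, ...).
  // The agent never replies in this sketch, so the prompt settles only on abort.
  sendPrompt(sessionId: string, clientId: string, signal: AbortSignal): Promise<void> {
    return new Promise<void>((resolve) => {
      const onAbort = (): void => {
        broadcastPromptCancelled(this.bus, sessionId, clientId);
        this.forwardAcpCancel(sessionId);
        resolve();
      };
      signal.addEventListener("abort", onAbort, { once: true });
    });
  }
}

// Usage: record bus publishes and ACP forwards in one log to check ordering.
const log: string[] = [];
const bridge = new SketchBridge(
  { publish: (e) => { log.push(`bus:${e.type}:${e.originatorClientId}`); } },
  (sid) => { log.push(`acp:cancel:${sid}`); },
);
bridge.cancelSession("s1", "client-A");
const controller = new AbortController();
void bridge.sendPrompt("s1", "client-B", controller.signal);
controller.abort(); // abort listeners run synchronously
```

Routing both paths through one helper is what the abort-path regression test protects: if `onAbort` stops calling it, peers silently lose the cancel signal again.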
3. `replay_complete` sentinel in EventBus.subscribe
A consumer attaching via `Last-Event-ID: <n>` had no positive signal when the replay loop drained — they had to heuristically time out the catch-up spinner. The state-resync path already had a synthetic `state_resync_required` frame; the success path lacked parity.
Fix: emit an id-less `replay_complete` synthetic frame at the end of the replay loop. Fires both when replay actually delivered frames AND when there was nothing to replay (empty ring). `data.replayedCount` is the actual count of force-pushed frames (not derived from id arithmetic, which would over-count when the state-resync path leaves a hole before the ring's earliest id).
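The replay contract can be illustrated with a small in-memory bus. `SessionEventBus`, its ring capacity, and `replayFor` are hypothetical and not the daemon's `EventBus` API; the sketch assumes resume only happens with a `Last-Event-ID` and shows the ordering described above: an optional `state_resync_required` when the ring has a hole, the replayed frames, then an id-less `replay_complete` whose `replayedCount` counts delivered frames.

```typescript
// Assumed frame shape: synthetic frames carry no id.
interface Frame {
  id?: number;
  type: string;
  data: Record<string, unknown>;
}

interface StoredFrame extends Frame {
  id: number;
}

// Hypothetical per-session bus with a bounded replay ring.
class SessionEventBus {
  private readonly ring: StoredFrame[] = [];
  private nextId = 1;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  publish(type: string, data: Record<string, unknown> = {}): void {
    this.ring.push({ id: this.nextId++, type, data });
    if (this.ring.length > this.capacity) this.ring.shift(); // evict the oldest frame
  }

  // Frames delivered to a subscriber resuming with Last-Event-ID before it goes live.
  replayFor(lastEventId: number): Frame[] {
    const out: Frame[] = [];
    const earliest = this.ring.length > 0 ? this.ring[0].id : this.nextId;
    if (lastEventId + 1 < earliest) {
      // Frames after lastEventId were already evicted: the client must resync.
      out.push({ type: "state_resync_required", data: { lastEventId } });
    }
    let replayedCount = 0;
    for (const frame of this.ring) {
      if (frame.id > lastEventId) {
        out.push(frame);
        replayedCount++; // count delivered frames instead of subtracting ids
      }
    }
    // Id-less sentinel, emitted even when nothing was replayed.
    out.push({ type: "replay_complete", data: { replayedCount } });
    return out;
  }
}

// Usage: a ring of 3 after 5 publishes holds ids 3..5.
const bus = new SessionEventBus(3);
for (let i = 0; i < 5; i++) bus.publish("agent_message_chunk", { seq: i });
const resumed = bus.replayFor(1);
```

Resuming from id 1 delivers ids 3 to 5, so `replayedCount` is 3; id arithmetic (5 - 1) would report 4 because id 2 is no longer in the ring.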
4. `originatorClientId` on `session_metadata_updated` envelope
`updateSessionMetadata` resolved the trusted client id for validation but did not stamp it on the broadcast envelope. Sibling events (`model_switched`, `approval_mode_changed`) all stamp envelope-level `originatorClientId`; this brings the metadata broadcast to parity.
5. `originatorClientId` on `session_closed` envelope
`session_closed` carried the closing client in `data.closedBy` only, but every other event uses the envelope-level `originatorClientId` field. Added the envelope stamp; kept `data.closedBy` for back-compat.
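With both envelopes stamped, a consumer can read attribution from one field across event types. The helper below is a hypothetical consumer-side sketch; the `title` metadata field and the sample ids are made up, and the `data.closedBy` fallback only matters for daemons that predate this change. Note that `permission_resolved` stamps the voter rather than the prompt originator (one of the deferred items), so "who" means something different for that type.

```typescript
interface Envelope {
  id?: number;
  type: string;
  originatorClientId?: string;
  data: Record<string, unknown>;
}

// Hypothetical helper: prefer the envelope stamp for every event type and fall
// back to data.closedBy only for session_closed from older daemons.
function attributedClientId(env: Envelope): string | undefined {
  if (env.originatorClientId !== undefined) return env.originatorClientId;
  const closedBy = env.data.closedBy;
  if (env.type === "session_closed" && typeof closedBy === "string") return closedBy;
  return undefined;
}

const events: Envelope[] = [
  { id: 41, type: "session_metadata_updated", originatorClientId: "client-A", data: { title: "Refactor auth" } },
  { id: 42, type: "session_closed", originatorClientId: "client-B", data: { closedBy: "client-B" } },
  { id: 43, type: "session_closed", data: { closedBy: "client-C" } }, // daemon without the envelope stamp
];
const attribution = events.map(attributedClientId);
```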
Out of scope (deferred to follow-ups)
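The audit also surfaced 3 items that require larger design discussion. They are tracked as separate follow-up PRs:

- In-session ACP `setModel` bus emit
- Workspace-wide broadcast of non-persisted approval-mode changes
- `permission_resolved` originator-vs-voter semantics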
Deployment coordination note
Downstream products that ship their own gateway with a user-message echo workaround (e.g., `web-terminal-sandboxs` `qwen-gateway`) will produce double frames once this daemon-side echo lands. Recommended rollout: deploy daemon first with the new echo, then flip the gateway's `GATEWAY_ECHO_USER_MESSAGE=false` env flag. The `_meta.source` marker (`'bridge-echo'` from daemon vs `'gateway-echo'` from gateway) lets future SDK-side dedup catch leftover misconfiguration.
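SDK-side dedup is deferred, but a sketch shows how the two `_meta.source` values could expose a gateway that still echoes after the daemon rollout. `UserEchoDeduper` and its keying on session, originator, and text are assumptions for illustration: it drops the second copy of a block echoed by both sources and raises a `misconfigured` flag. It handles text blocks only and never evicts keys, so treat it as illustrative.

```typescript
type EchoSource = "bridge-echo" | "gateway-echo";

interface EchoedUserChunk {
  sessionId: string;
  originatorClientId?: string;
  text: string;
  source?: string; // the frame's _meta.source, if present
}

// Hypothetical SDK-side guard; the PR leaves real dedup to a follow-up.
class UserEchoDeduper {
  private readonly firstSource = new Map<string, EchoSource>();
  misconfigured = false;

  accept(chunk: EchoedUserChunk): boolean {
    const source = chunk.source;
    if (source !== "bridge-echo" && source !== "gateway-echo") {
      return true; // not a synthesized echo: pass through untouched
    }
    const key = [chunk.sessionId, chunk.originatorClientId ?? "", chunk.text].join("\u0000");
    const first = this.firstSource.get(key);
    if (first !== undefined && first !== source) {
      // Same block echoed by both daemon and gateway: the gateway flag is still on.
      this.misconfigured = true;
      this.firstSource.delete(key);
      return false;
    }
    this.firstSource.set(key, source);
    return true;
  }
}

// Usage: the gateway and the daemon both echo "hi"; only one copy survives.
const dedup = new UserEchoDeduper();
const shown = [
  { sessionId: "s1", originatorClientId: "client-A", text: "hi", source: "gateway-echo" },
  { sessionId: "s1", originatorClientId: "client-A", text: "hi", source: "bridge-echo" },
  { sessionId: "s1", originatorClientId: "client-A", text: "next", source: "bridge-echo" },
].filter((chunk) => dedup.accept(chunk));
```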
Validation
```bash
cd packages/acp-bridge
npx vitest run # 291/291 pass (3 updated for replay_complete)
npx tsc --noEmit # clean
```